EXCHANGEABLE PAIRS
Abstract. Since the introduction of Stein's method in the early 1970s, much research has
been done in extending and strengthening it; however, there does not exist a version of 
Stein's original method of exchangeable pairs for multivariate normal approximation. The 
aim of this article is to fill this void. We present three abstract normal approximation 
theorems using exchangeable pairs in multivariate contexts, one for situations in which 
the underlying symmetries are discrete, and real and complex versions of a theorem for 
situations involving continuous symmetry groups. Our main applications are proofs of the 
approximate normality of rank k projections of Haar measure on the orthogonal and unitary 
groups, when k = o(n). 



1. Introduction 

Stein's method was introduced by Charles Stein [JT] as a tool for proving central limit theorems
for sums of dependent random variables. Stein's version of his method, best known as
the "method of exchangeable pairs", is described in detail in his later work [12]. The method
of exchangeable pairs is a general technique whose applicability is not restricted to sums of 
random variables; for some recent examples, one can look at the work of Jason Fulman [20] 
on central limit theorems for complicated objects arising from the representation theory of 
permutation groups, and the work of the second-named author [34] on the distribution of 
eigenfunctions of the Laplacian on Riemannian manifolds. 

One of the significant advantages of the method is that it automatically gives concrete 
error bounds. Although Stein's original theorem does not generally give Kolmogorov distance 
bounds of the correct order, there has been substantial research on modifications of Stein's 
result to obtain rate-optimal Berry-Esseen type bounds (see e.g. the works of Rinott & Rotar
[315] and Shao & Su [40]). The "infinitesimal" version of the method described in [32] and in
our Theorems [5] and [6] below frequently does produce bounds of the correct order, in total
variation distance in the univariate case and in Wasserstein distance in the multivariate case. 

Heuristically, the method of exchangeable pairs for univariate normal approximation goes
as follows. Suppose that a random variable $W$ is conjectured to be approximately a standard
Gaussian. The first step in the method is to construct a second random variable $W'$ on the
same probability space such that $(W, W')$ is an exchangeable pair, i.e. $(W, W')$ has the same
distribution as $(W', W)$. The random variable $W'$ is generally constructed by making a small
random change in $W$, so that $W$ and $W'$ are close.

Let $\Delta = W - W'$. The next step is to verify the existence of a small number $\lambda$ such that

(1) $\mathbb{E}(\Delta \mid W) = \lambda W + r_1$,

(2) $\mathbb{E}(\Delta^2 \mid W) = 2\lambda + r_2$, and

(3) $\mathbb{E}|\Delta|^3 = r_3$,

where the random quantities $r_1$, $r_2$, and $r_3$ are all negligible compared to $\lambda$. If the above
relations hold, then, depending on the sizes of $\lambda$ and the $r_j$'s, one can conclude that $W$
is approximately Gaussian. The exact statement of Stein's abstract normal approximation
theorem for piecewise differentiable test functions is the following:

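Theorem 1 (Stein [32], page 35). Let $(W, W')$ be an exchangeable pair of real random
variables such that $\mathbb{E}W^2 = 1$ and $\mathbb{E}[W - W' \mid W] = \lambda W$ for some $0 < \lambda < 1$. Let
$\Delta = W - W'$. Let $h : \mathbb{R} \to \mathbb{R}$ be bounded with piecewise continuous derivative $h'$. Then for
$Z$ a standard normal random variable,

\[ |\mathbb{E}h(W) - \mathbb{E}h(Z)| \le \frac{\|h - \mathbb{E}h(Z)\|_\infty}{\lambda} \sqrt{\mathrm{Var}\bigl(\mathbb{E}[\Delta^2 \mid W]\bigr)} + \frac{\|h'\|_\infty}{4\lambda}\, \mathbb{E}|\Delta|^3. \]

Observe that the condition $\mathbb{E}[W - W' \mid W] = \lambda W$ implies that $\mathbb{E}\Delta^2 = 2\lambda$, and thus the
bound in Stein's theorem above can also be stated as:

\[ |\mathbb{E}h(W) - \mathbb{E}h(Z)| \le 2\|h - \mathbb{E}h(Z)\|_\infty \sqrt{\mathbb{E}\Bigl[\Bigl(1 - \frac{1}{2\lambda}\mathbb{E}[\Delta^2 \mid W]\Bigr)^2\Bigr]} + \frac{\|h'\|_\infty}{4\lambda}\, \mathbb{E}|\Delta|^3. \]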
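As an illustration of conditions (1)-(3), the following is a minimal sketch rather than anything taken from the paper. It assumes the simplest possible example: $W$ is a normalized sum of $n$ independent fair signs, and $W'$ is obtained by replacing one uniformly chosen sign with a fresh one. The names `conditional_moments`, `lam` and `results` are hypothetical. The conditional moments are computed exactly by enumeration, conditioning on the whole sign vector; since the results depend on the signs only through $W$, the same identities hold conditionally on $W$.

```python
from itertools import product
from math import sqrt

def conditional_moments(x):
    """Exact E[D | X=x], E[D^2 | X=x], E[|D|^3 | X=x] for D = W - W'.

    W = sum(x) / sqrt(n); W' replaces a uniformly chosen coordinate x[I]
    by an independent fair sign s in {-1, +1}.
    """
    n = len(x)
    m1 = m2 = m3 = 0.0
    for i, s in product(range(n), (-1, 1)):
        d = (x[i] - s) / sqrt(n)     # W - W' when coordinate i becomes s
        p = 1.0 / (2 * n)            # P(I = i, new sign = s)
        m1 += p * d
        m2 += p * d ** 2
        m3 += p * abs(d) ** 3
    return m1, m2, m3

n = 6
lam = 1.0 / n                        # the "small number" lambda in (1)-(3)
results = []
for x in product((-1, 1), repeat=n):
    w = sum(x) / sqrt(n)
    results.append((w,) + conditional_moments(x))
```

In this toy example the remainders $r_1$ and $r_2$ vanish identically, and $r_3 = 4n^{-3/2}$ is of smaller order than $\lambda = 1/n$.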
Powerful as it is, the above theorem and all its existing modifications cater only to univariate
normal approximation. There has been some previous work in proving multivariate
central limit theorems using Stein's method, though none of these approaches has used
exchangeable pairs. In 1996, Rinott & Rotar [38] proved multivariate central limit theorems for
sums of dependent random vectors using the dependency graph version of Stein's method. 
Around the same time, Goldstein & Rinott [22] developed the size-bias coupling version of 
Stein's method for multivariate normal approximation. Both of these techniques are well-known
and in regular use. More recently, Raič [36] proved a new multivariate central limit
theorem for sums of dependent random vectors with the dependency graph approach which 
removed the need for finite third moments. However, as in the univariate case, there are 
many problems which are more amenable to analysis via exchangeable pairs (particularly 
the adaptation to the case of continuous symmetries) which necessitates the creation of a 
multivariate version of this method. The present authors introduced, for the first time, a
multivariate version of Theorem [1] in an earlier draft of this manuscript that was posted on
arXiv. Subsequently, an extension of one of our main results (Theorem [4]) to the case of
multivariate normal approximation with non-identity covariance was formulated by Reinert
and Röllin [37]. Our current draft is mainly a reorganization of the original manuscript, with
better error bounds in several examples. We refer to the Reinert-Röllin paper [37] for
many other interesting applications.

The contents of this paper are as follows. In Section [2] we prove three abstract normal
approximation theorems which give a framework for using the method of exchangeable pairs
in a multivariate context. The first is for situations in which the symmetry used in constructing
the exchangeable pair is discrete, and is a fairly direct analog of Theorem [1] above.
As an example, the theorem is applied in Section [3] to prove a basic central limit theorem
for a sum of independent, identically distributed random vectors.

The second abstract theorem of Section [2] includes an additional modification, making it
useful in situations in which continuous symmetries are present. The idea for the modification
was introduced by Stein in [13] and further developed in [32]. Section [4] contains two
applications of this theorem. First, for $Y$ a random vector in $\mathbb{R}^n$ with spherically symmetric
distribution, sufficient conditions are given under which the first $k$ coordinates are approximately
distributed as a standard normal random vector in $\mathbb{R}^k$. We then give a treatment of
projections of Haar measure on the orthogonal group. Specifically, for $M$ a random $n \times n$
orthogonal matrix and $A_1, \ldots, A_k$ fixed matrices over $\mathbb{R}$, we give an explicit bound on the
Wasserstein distance between $(\mathrm{Tr}(A_1M), \ldots, \mathrm{Tr}(A_kM))$ and a Gaussian random vector.

As a corollary to the theorem discussed above, we state a theorem for bounding the
distance between a complex random vector and a complex Gaussian random vector, in the
context of continuous groups of symmetries. The main application of this version of the
theorem is given in Section 01 where for $M$ a random $n \times n$ unitary matrix and $A_1, \ldots, A_k$
fixed matrices over $\mathbb{C}$, we derive an explicit bound on the Wasserstein distance between
$(\mathrm{Tr}(A_1M), \ldots, \mathrm{Tr}(A_kM))$ and a complex Gaussian random vector.

Before moving into Section [2] we give the following very brief outline of the literature
around the various other versions of Stein's method.

Other versions of Stein's method. The three most notable variants of Stein's method are
(i) the dependency graph approach introduced by Baldi and Rinott [3] and further developed
by Arratia, Goldstein and Gordon [3] and Barbour, Karoński, and Ruciński [7], (ii) the size-biased
coupling method of Goldstein and Rinott [23] (see also Barbour, Holst and Janson [5]),
and (iii) the zero-biased coupling technique due to Goldstein and Reinert [21]. In addition
to these three basic approaches, an important contribution was made by Andrew Barbour 
[6], who noticed the connection between Stein's method and diffusion approximation. This 
connection has subsequently been widely exploited by practitioners of Stein's method, and 
is a mainstay of some of our proofs. 

Besides normal approximation, Stein's method has been successfully used for proving
convergence to several other distributions as well. Shortly after the method was introduced
for normal approximation by Stein, Poisson approximation by Stein's method was introduced
by Chen [TJ] and became popular after the publication of [2], [3]. The method has also been
developed for gamma approximation by Luk [31]; for chi-square approximation by Pickett
[35]; for the uniform distribution on the discrete circle by Diaconis [27]; for the semi-circle
law by Götze and Tikhomirov [28]; for the binomial and multinomial distributions by Holmes
[29] and Loh [30]; and the hypergeometric distribution, also by Holmes [29].

The method of exchangeable pairs was extended to Poisson approximation by Chatterjee,
Diaconis and Meckes in the survey paper [13], and to a general method of normal approximation
for arbitrary functions of independent random variables in [12].

For further references and exposition (particularly to the method of exchangeable pairs),
we refer to the recent monograph [18].

1.1. Notation and conventions. The total variation distance $d_{TV}(\mu, \nu)$ between the measures
$\mu$ and $\nu$ on $\mathbb{R}$ is defined by

\[ d_{TV}(\mu, \nu) = \sup_A |\mu(A) - \nu(A)|, \]

where the supremum is over measurable sets $A$. This is equivalent to

\[ d_{TV}(\mu, \nu) = \frac{1}{2} \sup_f \left| \int f(t)\,d\mu(t) - \int f(t)\,d\nu(t) \right|, \]

where the supremum is taken over continuous functions which are bounded by 1 and vanish
at infinity; this is the definition most commonly used in what follows. The total variation
distance between two random variables $X$ and $Y$ is defined to be the total variation distance
between their distributions:

\[ d_{TV}(X, Y) = \sup_A |\mathbb{P}(X \in A) - \mathbb{P}(Y \in A)| = \frac{1}{2} \sup_f |\mathbb{E}f(X) - \mathbb{E}f(Y)|. \]

If the Banach space of signed measures on $\mathbb{R}$ is viewed as dual to the space of continuous
functions on $\mathbb{R}$ vanishing at infinity, then the total variation distance is (up to the factor of
$\frac{1}{2}$) the norm distance on that Banach space.

The Wasserstein distance $d_W(X, Y)$ between the random variables $X$ and $Y$ is defined by

\[ d_W(X, Y) = \sup_{M_1(g) \le 1} |\mathbb{E}g(X) - \mathbb{E}g(Y)|, \]

where $M_1(g) = \sup_{x \ne y} \frac{|g(x) - g(y)|}{|x - y|}$ is the Lipschitz constant of $g$. Note that Wasserstein distance
is not directly comparable to total variation distance, since the class of functions
considered is required to be Lipschitz but not required to be bounded. In particular, to- 
tal variation distance is always bounded by 1, whereas the statement that the Wasserstein 
distance between two distributions is bounded by 1 has content. On the space of probabil- 
ity distributions with finite absolute first moment, Wasserstein distance induces a stronger 
topology than the usual one described by weak convergence, but not as strong as the topol- 
ogy induced by the total variation distance. See [TH] for detailed discussion of the various 
notions of distance between probability distributions. 
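To make the comparison between the two distances concrete, here is a small sketch (not from the paper) for finitely supported laws on the real line. The helper names `total_variation` and `wasserstein_1d` are hypothetical, and the Wasserstein distance is computed through the one-dimensional identity $d_W = \int |F - G|$ for the two distribution functions, which is assumed here as a known fact. For two point masses at distance `eps`, the total variation distance is maximal while the Wasserstein distance is tiny.

```python
def total_variation(p, q):
    """sup_A |p(A) - q(A)| for discrete laws given as {point: mass} dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def wasserstein_1d(p, q):
    """Wasserstein distance on the line as the integral of |F_p - F_q|."""
    points = sorted(set(p) | set(q))
    dist, cdf_gap = 0.0, 0.0
    for left, right in zip(points, points[1:]):
        cdf_gap += p.get(left, 0.0) - q.get(left, 0.0)   # F_p - F_q on [left, right)
        dist += abs(cdf_gap) * (right - left)
    return dist

eps = 1e-3
mu = {0.0: 1.0}      # point mass at 0
nu = {eps: 1.0}      # point mass at eps
tv = total_variation(mu, nu)
w1 = wasserstein_1d(mu, nu)
```

Here `tv` equals 1 regardless of `eps`, whereas `w1` equals `eps`, so a small Wasserstein distance does not force a small total variation distance.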

We will use $N(\mu, \sigma^2)$ to denote the normal distribution on $\mathbb{R}$ with mean $\mu$ and variance
$\sigma^2$; unless otherwise stated, the random variable $Z = (Z_1, \ldots, Z_k)$ is understood to be a
standard Gaussian random vector on $\mathbb{R}^k$.

In $\mathbb{R}^n$, the Euclidean inner product is denoted $\langle \cdot, \cdot \rangle$ and the Euclidean norm is denoted
$|\cdot|$. On the space of real (resp. complex) $n \times n$ matrices, the Hilbert-Schmidt inner product
is defined by

\[ \langle A, B \rangle_{H.S.} = \mathrm{Tr}(AB^T), \qquad (\text{resp. } \langle A, B \rangle_{H.S.} = \mathrm{Tr}(AB^*)) \]

with corresponding norms

\[ \|A\|_{H.S.} = \sqrt{\mathrm{Tr}(AA^T)}, \qquad (\text{resp. } \|A\|_{H.S.} = \sqrt{\mathrm{Tr}(AA^*)}). \]

The operator norm of a matrix $A$ over $\mathbb{R}$ is defined by

\[ \|A\|_{op} = \sup_{|v| = 1,\, |w| = 1} |\langle Av, w \rangle|. \]

The $n \times n$ identity matrix is denoted $I_n$, the $n \times n$ matrix of all zeros is denoted $0_n$, and
$A \oplus B$ is the block direct sum of $A$ and $B$.

For $\Omega$ a domain in $\mathbb{R}^n$, the notation $C^k(\Omega)$ will be used for the space of $k$-times continuously
differentiable real-valued functions on $\Omega$, and $C_o^k(\Omega) \subseteq C^k(\Omega)$ are those $C^k$ functions on $\Omega$
with compact support. For $g : \mathbb{R}^k \to \mathbb{R}$, let

\[ M_1(g) := \sup_{x \ne y} \frac{|g(x) - g(y)|}{|x - y|}; \]

if $g \in C^1(\mathbb{R}^k)$ also, then let

\[ M_2(g) := \sup_{x \ne y} \frac{|\nabla g(x) - \nabla g(y)|}{|x - y|}; \]

if $g \in C^2(\mathbb{R}^k)$ as well, then

\[ M_3(g) := \sup_{x \ne y} \frac{\|\mathrm{Hess}\, g(x) - \mathrm{Hess}\, g(y)\|_{op}}{|x - y|}. \]

The last definition differs from the one in [36], where $M_3$ is defined in terms of the Hilbert-Schmidt
norm as opposed to the operator norm. Note that if $g \in C^1(\mathbb{R}^k)$, then $M_1(g) =
\sup_x |\nabla g(x)|$, and if $g \in C^2(\mathbb{R}^k)$, then $M_2(g) = \sup_x \|\mathrm{Hess}\, g(x)\|_{op}$.



2. TWO ABSTRACT NORMAL APPROXIMATION THEOREMS 

In this section we develop the general machine that will be applied in the examples in
Sections [3] and [4]. In the following, we use the notation $\mathcal{L}(X)$ to denote the law of a random
vector or variable $X$. The following lemma gives a second-order characterizing operator for
the Gaussian distribution on $\mathbb{R}^k$.

Lemma 2. Let $Z \in \mathbb{R}^k$ be a random vector with $\{Z_i\}_{i=1}^k$ independent, identically distributed
standard Gaussians.

(i) If $f : \mathbb{R}^k \to \mathbb{R}$ is two times continuously differentiable and compactly supported, then

\[ \mathbb{E}\bigl[\Delta f(Z) - \langle Z, \nabla f(Z) \rangle\bigr] = 0. \]

(ii) If $Y \in \mathbb{R}^k$ is a random vector such that

\[ \mathbb{E}\bigl[\Delta f(Y) - \langle Y, \nabla f(Y) \rangle\bigr] = 0 \]

for every $f \in C^2(\mathbb{R}^k)$ with $\mathbb{E}|\Delta f(Y) - \langle Y, \nabla f(Y) \rangle| < \infty$, then $\mathcal{L}(Y) = \mathcal{L}(Z)$.

(iii) If $g \in C_o^\infty(\mathbb{R}^k)$, then the function

\[ U_o g(x) := \int_0^1 \frac{1}{2t} \bigl[\mathbb{E}g(\sqrt{t}\,x + \sqrt{1-t}\,Z) - \mathbb{E}g(Z)\bigr]\, dt \]

is a solution to the differential equation

(4) $\Delta h(x) - \langle x, \nabla h(x) \rangle = g(x) - \mathbb{E}g(Z)$.

Remark. The form of $U_o g$ is a direct rewriting of the inverse of the Ornstein-Uhlenbeck
generator (see Barbour [5]).



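Proof. Part (i) is just integration by parts.

Part (ii) follows easily from part (iii): note that if

\[ \mathbb{E}\bigl[\Delta f(Y) - \langle Y, \nabla f(Y) \rangle\bigr] = 0 \]

for every $f \in C^2(\mathbb{R}^k)$ with $\mathbb{E}|\Delta f(Y) - \langle Y, \nabla f(Y) \rangle| < \infty$, then for $g \in C_o^\infty$ given,

\[ \mathbb{E}g(Y) - \mathbb{E}g(Z) = \mathbb{E}\bigl[\Delta(U_o g)(Y) - \langle Y, \nabla(U_o g)(Y) \rangle\bigr] = 0, \]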



SOURAV CHATTERJEE AND ELIZABETH MECKES 



and so $\mathcal{L}(Y) = \mathcal{L}(Z)$ since $C_o^\infty$ is dense in the class of bounded continuous functions vanishing
at infinity, with respect to the supremum norm.

A proof of part (iii) is given in [6], [21] and [36], all using results about Markov semi-groups.
For a direct proof, see [32].

□ 
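Part (i) of Lemma 2 can be sanity-checked numerically in dimension one. The sketch below is an illustration only: it uses $f(x) = \cos x$, which is smooth with bounded derivatives but not compactly supported (an assumption that goes slightly beyond the hypothesis of the lemma, although the same integration by parts applies), and a plain trapezoid rule for the Gaussian expectation. The function name `gaussian_expectation` and the grid parameters are hypothetical choices.

```python
from math import cos, sin, exp, pi, sqrt

def gaussian_expectation(func, half_width=12.0, steps=48000):
    """Trapezoid-rule approximation of E[func(Z)] for Z standard normal."""
    h = 2 * half_width / steps
    total = 0.0
    for j in range(steps + 1):
        x = -half_width + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * func(x) * exp(-x * x / 2)
    return total * h / sqrt(2 * pi)

# f(x) = cos(x), so f''(x) = -cos(x) and f'(x) = -sin(x).
# The Stein operator applied to f is f''(x) - x f'(x).
stein_value = gaussian_expectation(lambda x: -cos(x) - x * (-sin(x)))
mean_cos = gaussian_expectation(cos)   # equals exp(-1/2) analytically
```

The two terms $\mathbb{E}\cos Z$ and $\mathbb{E}[Z \sin Z]$ are each nonzero, so the vanishing of `stein_value` is a genuine cancellation rather than a symmetry artifact.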

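The next lemma gives useful bounds on $U_o g$ and its derivatives in terms of $g$ and its
derivatives. As in [36], bounds are most naturally given in terms of the quantities $M_i(g)$
defined in the introduction.

Lemma 3. For $g : \mathbb{R}^k \to \mathbb{R}$ given, $U_o g$ satisfies the following bounds:

(i) $\sup_{x \in \mathbb{R}^k} \|\mathrm{Hess}\, U_o g(x)\|_{H.S.} \le M_1(g)$.

(ii) $M_3(U_o g) \le \frac{\sqrt{2\pi}}{4} M_2(g)$.

Proof. Write $h(x) = U_o g(x)$ and $Z_{x,t} = \sqrt{t}\,x + \sqrt{1-t}\,Z$. Note that by the formula for $U_o g$,

\[ \frac{\partial^r h}{\partial x_{i_1} \cdots \partial x_{i_r}}(x) = \int_0^1 (2t)^{-1} t^{r/2}\, \mathbb{E}\left[\frac{\partial^r g}{\partial x_{i_1} \cdots \partial x_{i_r}}(Z_{x,t})\right] dt. \tag{5} \]

It follows by integration by parts on the Gaussian expectation that

\[ \frac{\partial^2 h}{\partial x_i \partial x_j}(x) = \int_0^1 \frac{1}{2}\, \mathbb{E}\left[\frac{\partial^2 g}{\partial x_i \partial x_j}(\sqrt{t}\,x + \sqrt{1-t}\,Z)\right] dt = \int_0^1 \frac{1}{2\sqrt{1-t}}\, \mathbb{E}\left[Z_i \frac{\partial g}{\partial x_j}(Z_{x,t})\right] dt, \tag{6} \]

and so

\[ \mathrm{Hess}\, h(x) = \int_0^1 \frac{1}{2\sqrt{1-t}}\, \mathbb{E}\bigl[Z\,(\nabla g(Z_{x,t}))^T\bigr]\, dt. \tag{7} \]

Fix a $k \times k$ matrix $A$. Then

\[ \langle \mathrm{Hess}\, h(x), A \rangle_{H.S.} = \int_0^1 \frac{1}{2\sqrt{1-t}}\, \mathbb{E}\bigl[\langle A^T Z, \nabla g(Z_{x,t}) \rangle\bigr]\, dt, \]

thus

\[ |\langle \mathrm{Hess}\, h(x), A \rangle_{H.S.}| \le M_1(g)\, \mathbb{E}|A^T Z| \int_0^1 \frac{1}{2\sqrt{1-t}}\, dt = M_1(g)\, \mathbb{E}|A^T Z|. \]

If $A = [a_{ij}]_{i,j=1}^k$, then

\[ \mathbb{E}|A^T Z| \le \sqrt{\mathbb{E}|A^T Z|^2} = \sqrt{\sum_{i=1}^k \mathbb{E}\Bigl(\sum_{j=1}^k a_{ji} Z_j\Bigr)^2} = \sqrt{\sum_{i,j=1}^k a_{ij}^2} = \|A\|_{H.S.}, \]

and thus

\[ \|\mathrm{Hess}\, h(x)\|_{H.S.} \le M_1(g) \]

for all $x \in \mathbb{R}^k$, hence part (i).

For part (ii), let $u$ and $v$ be fixed vectors in $\mathbb{R}^k$ with $|u| = |v| = 1$. Then it follows from
(7) that

\[ \langle (\mathrm{Hess}\, h(x) - \mathrm{Hess}\, h(y))u, v \rangle = \int_0^1 \frac{1}{2\sqrt{1-t}}\, \mathbb{E}\bigl[\langle Z, v \rangle \langle \nabla g(Z_{x,t}) - \nabla g(Z_{y,t}), u \rangle\bigr]\, dt, \]

and so

\[ |\langle (\mathrm{Hess}\, h(x) - \mathrm{Hess}\, h(y))u, v \rangle| \le |x - y|\, M_2(g)\, \mathbb{E}|\langle Z, v \rangle| \int_0^1 \frac{\sqrt{t}}{2\sqrt{1-t}}\, dt = |x - y|\, M_2(g)\, \frac{\sqrt{2\pi}}{4}, \]

since $\langle Z, v \rangle$ is just a standard Gaussian random variable. This completes the proof of part
(ii).

□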



There is an important difference in the behavior of solutions to the Stein equation (4) in
the context of multivariate approximation versus univariate approximation. In the univariate
case, one can replace the expression on the left-hand side of (4) with a first-order expression:
for a test function $h$, the function $g(x) = U h(x)$ which solves the differential equation

\[ g'(x) - x g(x) = h(x) - \mathbb{E}h(Z) \]

satisfies the bounds (see [42])

\[ \|g\|_\infty \le \sqrt{\frac{\pi}{2}}\, \|h - \mathbb{E}h(Z)\|_\infty, \qquad M_1(g) \le 2\|h - \mathbb{E}h(Z)\|_\infty, \qquad M_2(g) \le 2M_1(h), \]

and the fact that the differential equation is first order rather than second then allows for
reducing the degree of smoothness needed by one, over what is required in the multivariate
case. Alternatively, one can use the same expression as in (4) above; in this case, $M_3(g) \le
2M_1(h)$ (see [36]), also decreasing by one the degree of smoothness needed. This improvement
allowed the univariate version [33] of Theorem [12] below, on the approximation of projections
of Haar measure on the orthogonal group by Gaussian measure, to be proved in total variation
distance as opposed to Wasserstein distance.

This improvement is not possible in the multivariate case; it can be shown, for example
(see [3S]), that if

\[ f(x, y) = \max\{\min\{x, y\}, 0\}, \]

then $U_o f$ defined as in Lemma [2] is twice differentiable but $\mathrm{Hess}\, U_o f$ is not Lipschitz.



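Theorem 4. Let $X$ and $X'$ be two random vectors in $\mathbb{R}^k$ such that $\mathcal{L}(X) = \mathcal{L}(X')$, and let
$Z = (Z_1, \ldots, Z_k) \in \mathbb{R}^k$ be a standard Gaussian random vector. Suppose there is a constant
$\lambda$ such that

\[ \frac{1}{\lambda}\, \mathbb{E}[X' - X \mid X] = -X. \tag{8} \]

Define the random matrix $E$ by

\[ \frac{1}{2\lambda}\, \mathbb{E}\bigl[(X' - X)(X' - X)^T \mid X\bigr] = \sigma^2 I_k + \mathbb{E}[E \mid X]. \tag{9} \]

Then if $g \in C^2(\mathbb{R}^k)$ with $M_1(g) < \infty$ and $M_2(g) < \infty$,

\[ |\mathbb{E}g(X) - \mathbb{E}g(\sigma Z)| \le \frac{1}{\sigma} M_1(g)\, \mathbb{E}\|E\|_{H.S.} + \frac{\sqrt{2\pi}}{24\sigma\lambda} M_2(g)\, \mathbb{E}|X' - X|^3. \tag{10} \]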



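Proof. Fix $g$, and let $U_o g$ be as in Lemma [2]. Note that it suffices to assume that $g \in C^\infty(\mathbb{R}^k)$:
let $h_\epsilon : \mathbb{R}^k \to \mathbb{R}$ be a centered Gaussian density with covariance matrix $\epsilon^2 I_k$. Approximate $g$
by $g * h_\epsilon$; clearly $\|g * h_\epsilon - g\|_\infty \to 0$ as $\epsilon \to 0$, and by Young's inequality, $M_1(g * h_\epsilon) \le M_1(g)$
and $M_2(g * h_\epsilon) \le M_2(g)$.

Note also that if $f(x) = g(\sigma x)$, then $|\mathbb{E}g(X) - \mathbb{E}g(\sigma Z)| = |\mathbb{E}f(\sigma^{-1}X) - \mathbb{E}f(Z)|$. It is easy
to see that $M_1(f) = \sigma M_1(g)$ and $M_2(f) = \sigma^2 M_2(g)$. It thus follows from the theorem for
$\sigma = 1$ that

\[ |\mathbb{E}g(X) - \mathbb{E}g(\sigma Z)| \le \sigma M_1(g)\, \mathbb{E}\|\sigma^{-2}E\|_{H.S.} + \frac{\sigma^2 \sqrt{2\pi}\, M_2(g)}{24\lambda}\, \mathbb{E}|\sigma^{-1}(X' - X)|^3 = \frac{M_1(g)}{\sigma}\, \mathbb{E}\|E\|_{H.S.} + \frac{\sqrt{2\pi}\, M_2(g)}{24\sigma\lambda}\, \mathbb{E}|X' - X|^3; \]

we therefore restrict our attention to the case $\sigma = 1$.

For notational convenience, write $h(x) = U_o g(x)$. Then

\begin{align*}
0 &= \frac{1}{\lambda}\, \mathbb{E}\bigl[h(X') - h(X)\bigr] \\
  &= \frac{1}{\lambda}\, \mathbb{E}\Bigl[\langle X' - X, \nabla h(X) \rangle + \frac{1}{2}(X' - X)^T (\mathrm{Hess}\, h(X))(X' - X) + R\Bigr] \\
  &= \mathbb{E}\Bigl[-\langle X, \nabla h(X) \rangle + \Delta h(X) + \langle E, \mathrm{Hess}\, h(X) \rangle_{H.S.} + \frac{R}{\lambda}\Bigr] \\
  &= \mathbb{E}g(X) - \mathbb{E}g(Z) + \mathbb{E}\Bigl[\langle E, \mathrm{Hess}\, h(X) \rangle_{H.S.} + \frac{R}{\lambda}\Bigr], \tag{11}
\end{align*}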

where R is the error in the second-order expansion. By an alternate form of Taylor's theorem 
(see [HI), 



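\[ \mathbb{E}|R| \le \frac{M_3(h)}{6}\, \mathbb{E}|X' - X|^3 \le \frac{\sqrt{2\pi}\, M_2(g)}{24}\, \mathbb{E}|X' - X|^3. \]

Furthermore,

\[ \mathbb{E}|\langle E, \mathrm{Hess}\, h(X) \rangle_{H.S.}| \le \sup_y \|\mathrm{Hess}\, h(y)\|_{H.S.}\, \mathbb{E}\|E\|_{H.S.} \le M_1(g)\, \mathbb{E}\|E\|_{H.S.}. \]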



This completes the proof. 



□ 



Remarks.

(i) Usually the $X$ and $X'$ of the theorem will make an exchangeable pair, but this is
not required for the proof.

(ii) The coupling assumed in (8) implies that $\mathbb{E}X = 0$. It is not required that $X$ have a
scalar covariance matrix; however, it follows from (8) and (9) that

\[ \mathbb{E}[E] = \mathbb{E}[XX^T] - \sigma^2 I_k. \]

It should therefore be the case that the covariance matrix of $X$ is not too far from
$\sigma^2 I_k$.
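The identity for the mean of the remainder matrix in Remark (ii) can be spelled out as follows. This is a sketch of the omitted computation, written under the added assumption that $X$ has finite second moments; it uses only conditions (8) and (9) of Theorem 4 and the equality of the laws of $X$ and $X'$.

```latex
\begin{align*}
\mathbb{E}\bigl[X'X^T\bigr]
  &= \mathbb{E}\bigl[\mathbb{E}[X' \mid X]\,X^T\bigr]
   = (1-\lambda)\,\mathbb{E}\bigl[XX^T\bigr]
   && \text{by (8)},\\
\mathbb{E}\bigl[(X'-X)(X'-X)^T\bigr]
  &= \mathbb{E}[X'X'^T] - \mathbb{E}[X'X^T] - \mathbb{E}[XX'^T] + \mathbb{E}[XX^T]\\
  &= 2\,\mathbb{E}[XX^T] - 2(1-\lambda)\,\mathbb{E}[XX^T]
   = 2\lambda\,\mathbb{E}[XX^T]
   && \text{since } \mathcal{L}(X') = \mathcal{L}(X),\\
\sigma^2 I_k + \mathbb{E}[E]
  &= \frac{1}{2\lambda}\,\mathbb{E}\bigl[(X'-X)(X'-X)^T\bigr]
   = \mathbb{E}[XX^T]
   && \text{by (9)}.
\end{align*}
```

The claim that the mean of $X$ vanishes follows in the same way: taking expectations in (8) gives $-\lambda\,\mathbb{E}X = \mathbb{E}[X' - X] = 0$.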



The following is a continuous analog of Theorem [4]. A univariate version which gives
approximation in total variation distance was proved in [33]. As was noted following the
proof of Lemma [3], a bound on total variation distance in the multivariate context is not
possible with the method used here because of the difference in the behavior of solutions to
the Stein equation in the multivariate context.

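Theorem 5. Let $X$ be a random vector in $\mathbb{R}^k$ and for each $\epsilon > 0$ let $X_\epsilon$ be a random vector
such that $\mathcal{L}(X) = \mathcal{L}(X_\epsilon)$, with the property that $\lim_{\epsilon \to 0} X_\epsilon = X$ almost surely. Let $Z$ be a
normal random vector in $\mathbb{R}^k$ with mean zero and covariance matrix $\sigma^2 I_k$. Suppose there is a
function $\lambda(\epsilon)$ and a random matrix $F$ such that the following conditions hold.

(i) $\displaystyle \frac{1}{\lambda(\epsilon)}\, \mathbb{E}\bigl[(X_\epsilon - X) \mid X\bigr] \xrightarrow[\epsilon \to 0]{L_1} -X.$

(ii) $\displaystyle \frac{1}{2\lambda(\epsilon)}\, \mathbb{E}\bigl[(X_\epsilon - X)(X_\epsilon - X)^T \mid X\bigr] \xrightarrow[\epsilon \to 0]{L_1} \sigma^2 I_k + \mathbb{E}[F \mid X].$

(iii) For each $\rho > 0$,

\[ \lim_{\epsilon \to 0} \frac{1}{\lambda(\epsilon)}\, \mathbb{E}\bigl[|X_\epsilon - X|^2\, \mathbb{I}(|X_\epsilon - X|^2 > \rho)\bigr] = 0. \]

Then

\[ d_W(X, Z) \le \frac{1}{\sigma}\, \mathbb{E}\|F\|_{H.S.}. \tag{12} \]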


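Proof. Fix a test function $g$; as in the proof of Theorem [4], it suffices to assume that $g \in
C^\infty(\mathbb{R}^k)$ and to consider only the case $\sigma = 1$; the general result follows exactly as before.
Let $U_o g$ be as in Lemma [2], and as before, write $h(x) = U_o g(x)$. Observe

\begin{align*}
0 &= \frac{1}{\lambda(\epsilon)}\, \mathbb{E}\bigl[h(X_\epsilon) - h(X)\bigr] \\
  &= \frac{1}{\lambda(\epsilon)}\, \mathbb{E}\Bigl[\langle X_\epsilon - X, \nabla h(X) \rangle + \frac{1}{2}(X_\epsilon - X)^T (\mathrm{Hess}\, h(X))(X_\epsilon - X) + R\Bigr], \tag{13}
\end{align*}

where $R$ is the error in the second-order approximation of $h(X_\epsilon) - h(X)$. By Taylor's theorem,
there is a constant $K$ (depending on $h$) and a function $\delta$ with $\delta(x) \le K \min\{x^2, x^3\}$, such
that $|R| \le \delta(|X_\epsilon - X|)$. Fix $\rho > 0$. Then by breaking up the integrand over the sets
$\{|X_\epsilon - X| \le \rho\}$ and $\{|X_\epsilon - X| > \rho\}$,

\begin{align*}
\frac{1}{\lambda(\epsilon)}\, \mathbb{E}|R| &\le \frac{K}{\lambda(\epsilon)}\, \mathbb{E}\Bigl[|X_\epsilon - X|^3\, \mathbb{I}(|X_\epsilon - X| \le \rho) + |X_\epsilon - X|^2\, \mathbb{I}(|X_\epsilon - X| > \rho)\Bigr] \\
&\le \frac{K\rho\, \mathbb{E}|X_\epsilon - X|^2}{\lambda(\epsilon)} + \frac{K}{\lambda(\epsilon)}\, \mathbb{E}\Bigl[|X_\epsilon - X|^2\, \mathbb{I}(|X_\epsilon - X| > \rho)\Bigr].
\end{align*}

The second term tends to zero as $\epsilon \to 0$ by condition (iii); condition (ii) implies that the
first is bounded by $CK\rho$ for a constant $C$ depending on $k$ and on the distribution of $X$. It
follows that

\[ \lim_{\epsilon \to 0} \frac{1}{\lambda(\epsilon)}\, \mathbb{E}|R| = 0. \]

For the first two terms of (13),

\begin{align*}
\lim_{\epsilon \to 0} &\frac{1}{\lambda(\epsilon)}\, \mathbb{E}\Bigl[\langle X_\epsilon - X, \nabla h(X) \rangle + \frac{1}{2}(X_\epsilon - X)^T (\mathrm{Hess}\, h(X))(X_\epsilon - X)\Bigr] \\
&= \lim_{\epsilon \to 0} \frac{1}{\lambda(\epsilon)}\, \mathbb{E}\Bigl[\bigl\langle \mathbb{E}[(X_\epsilon - X) \mid X], \nabla h(X) \bigr\rangle + \frac{1}{2}\bigl\langle \mathbb{E}[(X_\epsilon - X)(X_\epsilon - X)^T \mid X], \mathrm{Hess}\, h(X) \bigr\rangle_{H.S.}\Bigr] \\
&= \mathbb{E}\bigl[-\langle X, \nabla h(X) \rangle + \Delta h(X) + \langle \mathbb{E}[F \mid X], \mathrm{Hess}\, h(X) \rangle_{H.S.}\bigr] \\
&= \mathbb{E}g(X) - \mathbb{E}g(Z) + \mathbb{E}\bigl[\langle \mathbb{E}[F \mid X], \mathrm{Hess}\, h(X) \rangle_{H.S.}\bigr], \tag{14}
\end{align*}

where conditions (i) and (ii) together with the boundedness of $\nabla h$ and $\mathrm{Hess}\, h$ are used to
get the third line and the definition of $h = U_o g$ is used to get the fourth line. We have thus
shown that

\[ \mathbb{E}[g(X) - g(Z)] = -\mathbb{E}\langle F, \mathrm{Hess}\, h(X) \rangle_{H.S.}. \tag{15} \]

The result now follows immediately by applying the Cauchy-Schwarz inequality to (15) and
then the bound $\|\mathrm{Hess}\, h(x)\|_{H.S.} \le M_1(g)$ from Lemma [3](i).

□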



Remarks.

(i) It is easy to see that if

(iii') $\displaystyle \lim_{\epsilon \to 0} \frac{1}{\lambda(\epsilon)}\, \mathbb{E}|X_\epsilon - X|^3 = 0,$

then condition (iii) of the theorem holds. This is what is done in the applications
below.

(ii) As in Theorem [4], the condition (i) implies that $\mathbb{E}X = 0$ and it follows from (i) and
(ii) that

\[ \mathbb{E}F = \mathbb{E}XX^T - \sigma^2 I; \]

the covariance matrix of $X$ should thus not be far from $\sigma^2 I$.

Theorem [5] has the following corollary for complex random vectors. 

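Corollary 6. Let $W$ be a random vector in $\mathbb{C}^k$ and for each $\epsilon > 0$ let $W_\epsilon$ be a random
vector such that $\mathcal{L}(W) = \mathcal{L}(W_\epsilon)$, with the property that $\lim_{\epsilon \to 0} W_\epsilon = W$ almost surely. Let
$Z = (Z_1, \ldots, Z_k)$ be a standard complex Gaussian random vector; i.e., with covariance matrix
of the corresponding random vector in $\mathbb{R}^{2k}$ given by $\frac{1}{2} I_{2k}$. Suppose there is a function $\lambda(\epsilon)$
and complex $k \times k$ random matrices $\Gamma = [\gamma_{ij}]$ and $\Lambda = [\lambda_{ij}]$ such that

(i) $\displaystyle \frac{1}{\lambda(\epsilon)}\, \mathbb{E}\bigl[(W_\epsilon - W) \mid W\bigr] \xrightarrow[\epsilon \to 0]{L_1} -W.$

(ii) $\displaystyle \frac{1}{2\lambda(\epsilon)}\, \mathbb{E}\bigl[(W_\epsilon - W)(W_\epsilon - W)^* \mid W\bigr] \xrightarrow[\epsilon \to 0]{L_1} I_k + \mathbb{E}[\Gamma \mid W].$

(iii) $\displaystyle \frac{1}{2\lambda(\epsilon)}\, \mathbb{E}\bigl[(W_\epsilon - W)(W_\epsilon - W)^T \mid W\bigr] \xrightarrow[\epsilon \to 0]{L_1} \mathbb{E}[\Lambda \mid W].$

(iv) For each $\rho > 0$,

\[ \lim_{\epsilon \to 0} \frac{1}{\lambda(\epsilon)}\, \mathbb{E}\bigl[|W_\epsilon - W|^2\, \mathbb{I}(|W_\epsilon - W|^2 > \rho)\bigr] = 0. \]

Then

\[ d_W(W, Z) \le \mathbb{E}\|\Gamma\|_{H.S.} + \mathbb{E}\|\Lambda\|_{H.S.}. \]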

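Proof. Identifying $\mathbb{C}^k$ with $\mathbb{R}^{2k}$, $W$ satisfies the conditions of Theorem [5] with $\sigma^2 = \frac{1}{2}$ and $F$
given as a $k \times k$ matrix of $2 \times 2$ blocks, with the $i$-$j$th block equal to

\[ \frac{1}{2} \begin{bmatrix} \mathrm{Re}(\gamma_{ij} + \lambda_{ij}) & \mathrm{Im}(\lambda_{ij} - \gamma_{ij}) \\ \mathrm{Im}(\lambda_{ij} + \gamma_{ij}) & \mathrm{Re}(\gamma_{ij} - \lambda_{ij}) \end{bmatrix}. \]

Thus $\|F\|_{H.S.}^2 = \frac{1}{2}\bigl(\|\Gamma\|_{H.S.}^2 + \|\Lambda\|_{H.S.}^2\bigr)$ and

\[ \mathbb{E}\|F\|_{H.S.} \le \frac{1}{\sqrt{2}}\bigl(\mathbb{E}\|\Gamma\|_{H.S.} + \mathbb{E}\|\Lambda\|_{H.S.}\bigr). \]

□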
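The norm identity used in the proof of Corollary 6 is purely algebraic, so it can be checked on arbitrary complex matrices. The sketch below is illustrative only: the helper names are hypothetical, the coordinates of $\mathbb{C}^k$ are assumed to be ordered as (real part, imaginary part) pairs, and the random matrices are just test inputs with no probabilistic meaning.

```python
import random

def hs_norm_sq(matrix):
    """Squared Hilbert-Schmidt norm: sum of squared moduli of the entries."""
    return sum(abs(entry) ** 2 for row in matrix for entry in row)

def real_block_matrix(gamma, lam):
    """2k x 2k real matrix F assembled from 2 x 2 blocks as in the proof."""
    k = len(gamma)
    f = [[0.0] * (2 * k) for _ in range(2 * k)]
    for i in range(k):
        for j in range(k):
            g, l = gamma[i][j], lam[i][j]
            f[2 * i][2 * j] = 0.5 * (g + l).real
            f[2 * i][2 * j + 1] = 0.5 * (l - g).imag
            f[2 * i + 1][2 * j] = 0.5 * (l + g).imag
            f[2 * i + 1][2 * j + 1] = 0.5 * (g - l).real
    return f

rng = random.Random(0)   # fixed seed so the example is reproducible
k = 4
gamma = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)] for _ in range(k)]
lam = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(k)] for _ in range(k)]
lhs = hs_norm_sq(real_block_matrix(gamma, lam))
rhs = 0.5 * (hs_norm_sq(gamma) + hs_norm_sq(lam))
```

The quantities `lhs` and `rhs` agree up to rounding, which is the identity relating the Hilbert-Schmidt norm of $F$ to those of the two complex matrices.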



3. Examples using Theorem [4]

3.1. A basic central limit theorem. As a simple illustration of the use of Theorem [4],
we derive error bounds in the classical multivariate CLT for sums of independent random
vectors. While the question of error bounds in the univariate CLT was settled long ago, the
optimal bounds in the multivariate case are still unknown and much work has been done
in this direction. One important contribution was made by Götze [23], who used Stein's
method in conjunction with induction. To the best of our knowledge, the most recent results
are due to V. Bentkus [10], where one can also find extensive pointers to the literature.

Suppose $Y$ is a random vector in $\mathbb{R}^k$ with mean zero and identity covariance. Let $W$ be
the normalized sum of $n$ i.i.d. copies of $Y$. Götze [23] and Bentkus [10] both give bounds
on quantities like $\Delta_n = \sup_{f \in \mathcal{A}} |\mathbb{E}f(W) - \mathbb{E}f(Z)|$, where $Z = (Z_1, \ldots, Z_k)$ is a standard
$k$-dimensional normal random vector and $\mathcal{A}$ is any collection of functions satisfying certain
properties. For example, when $\mathcal{A}$ is the class of indicator functions of convex sets, Bentkus
gets $\Delta_n \le 400\, k^{1/4} n^{-1/2}\, \mathbb{E}|Y|^3$, improving on Götze's earlier bound which has a coefficient of
$k^{1/2}$ rather than $k^{1/4}$. Note that $\mathbb{E}|Y|^3 = O(k^{3/2})$.

Theorem [4] allows us to easily obtain uniform bounds on $|\mathbb{E}g(S_n) - \mathbb{E}g(Z)|$ for large classes
of smooth functions.

Theorem 7. Let $\{Y_i\}_{i=1}^n$ be a set of independent, identically distributed random vectors in
$\mathbb{R}^k$. Assume that the $Y_i$ are such that

\[ \mathbb{E}(Y_i) = 0, \qquad \mathbb{E}(Y_i Y_i^T) = I_k. \]

Let $W = \frac{1}{\sqrt{n}} \sum_{i=1}^n Y_i$. Then for any $g \in C^2(\mathbb{R}^k)$,

\[ |\mathbb{E}g(W) - \mathbb{E}g(Z)| \le \frac{M_1(g)}{2\sqrt{n}} \sqrt{\mathbb{E}|Y_1|^4 - k} + \frac{\sqrt{2\pi}}{3\sqrt{n}}\, M_2(g)\, \mathbb{E}|Y_1|^3. \]



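Proof. To apply Theorem [4], make an exchangeable pair $(W, W')$ as follows. For each $i$, let
$X_i$ be an independent copy of $Y_i$, and let $I$ be a uniform random variable in $\{1, \ldots, n\}$,
independent of everything. Define $W'$ by

\[ W' = W - \frac{Y_I}{\sqrt{n}} + \frac{X_I}{\sqrt{n}}. \]

Then

\begin{align*}
\mathbb{E}[W' - W \mid W] &= \frac{1}{\sqrt{n}}\, \mathbb{E}[X_I - Y_I \mid W] \\
&= \frac{1}{n^{3/2}} \sum_{i=1}^n \mathbb{E}[X_i - Y_i \mid W] \\
&= -\frac{1}{n}\, W,
\end{align*}

where the independence of $X_i$ and $W$ has been used in the last line. Thus condition (8) of
Theorem [4] holds with $\lambda = \frac{1}{n}$.

It remains to check condition (9) and bound the $E_{j\ell}$. Write $Y_i = (Y_i^1, \ldots, Y_i^k)$. For $1 \le
j, \ell \le k$,

\begin{align*}
E_{j\ell} &= \frac{n}{2}\, \mathbb{E}\bigl[(W'_j - W_j)(W'_\ell - W_\ell) \mid W\bigr] - \delta_{j\ell} \\
&= \frac{1}{2n} \sum_{i=1}^n \mathbb{E}\bigl[X_i^j X_i^\ell - X_i^j Y_i^\ell - X_i^\ell Y_i^j + Y_i^j Y_i^\ell \mid W\bigr] - \delta_{j\ell} \\
&= \frac{1}{2n} \sum_{i=1}^n \mathbb{E}\bigl[Y_i^j Y_i^\ell - \delta_{j\ell} \mid W\bigr],
\end{align*}

by the independence of the $X_i$ and the $Y_i$. Thus

\begin{align*}
\mathbb{E}E_{j\ell}^2 &= \frac{1}{4n^2}\, \mathbb{E}\Bigl(\mathbb{E}\Bigl[\sum_{i=1}^n (Y_i^j Y_i^\ell - \delta_{j\ell}) \Bigm| W\Bigr]\Bigr)^2 \\
&\le \frac{1}{4n^2}\, \mathbb{E}\Bigl(\sum_{i=1}^n (Y_i^j Y_i^\ell - \delta_{j\ell})\Bigr)^2 \\
&= \frac{1}{4n^2} \sum_{i=1}^n \mathbb{E}\bigl(Y_i^j Y_i^\ell - \delta_{j\ell}\bigr)^2 \\
&= \frac{1}{4n}\, \mathbb{E}\bigl(Y_1^j Y_1^\ell - \delta_{j\ell}\bigr)^2,
\end{align*}

where the independence of the $Y_i$ has been used to get the third line. It follows that

\[ \mathbb{E}\|E\|_{H.S.} \le \sqrt{\sum_{j,\ell} \mathbb{E}E_{j\ell}^2} \le \frac{1}{2\sqrt{n}} \sqrt{\mathbb{E}|Y_1|^4 - k}. \]

It remains to bound the second term of Theorem [4].

\begin{align*}
\frac{1}{\lambda}\, \mathbb{E}|W' - W|^3 &= \frac{1}{\sqrt{n}}\, \mathbb{E}|X_I - Y_I|^3 = \frac{1}{\sqrt{n}}\, \mathbb{E}|X_1 - Y_1|^3 \\
&\le \frac{1}{\sqrt{n}}\, \mathbb{E}\bigl(|X_1|^3 + 3|X_1|^2|Y_1| + 3|Y_1|^2|X_1| + |Y_1|^3\bigr).
\end{align*}

Applying Hölder's inequality with $p = \frac{3}{2}$ and $q = 3$,

\[ \mathbb{E}|X_1|^2|Y_1| \le \bigl(\mathbb{E}|X_1|^3\bigr)^{2/3} \bigl(\mathbb{E}|Y_1|^3\bigr)^{1/3} = \mathbb{E}|Y_1|^3. \]

It follows that

\[ \frac{1}{\lambda}\, \mathbb{E}|W' - W|^3 \le \frac{8\, \mathbb{E}|Y_1|^3}{\sqrt{n}}. \]

Together with Theorem [4], this finishes the proof.

□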
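The two conditional-moment computations in the proof of Theorem 7 can be checked exactly on a small discrete example. The sketch below is an illustration under stated assumptions, not the paper's construction in full generality: the summands are taken to be uniform on the $2k$ points $\pm\sqrt{k}\,e_j$ (which have mean zero and identity covariance), the conditioning is on the whole sample rather than on $W$ alone, and the names `atoms`, `pair_moments` and `predicted` are hypothetical.

```python
from math import sqrt

def atoms(k):
    """Support of Y: the 2k points +/- sqrt(k) e_j, each with mass 1/(2k)."""
    pts = []
    for j in range(k):
        for sign in (1.0, -1.0):
            v = [0.0] * k
            v[j] = sign * sqrt(k)
            pts.append(v)
    return pts

def pair_moments(ys, k):
    """Exact n*E[W'-W | sample] and (n/2)*E[(W'-W)(W'-W)^T | sample]."""
    n = len(ys)
    support = atoms(k)
    drift = [0.0] * k
    second = [[0.0] * k for _ in range(k)]
    for y in ys:                  # y plays the role of the replaced summand Y_I
        for x in support:         # x plays the role of the independent copy X_I
            p = 1.0 / (n * len(support))
            d = [(x[a] - y[a]) / sqrt(n) for a in range(k)]   # W' - W
            for a in range(k):
                drift[a] += n * p * d[a]
                for b in range(k):
                    second[a][b] += 0.5 * n * p * d[a] * d[b]
    return drift, second

k = 3
pts = atoms(k)
ys = [pts[0], pts[3], pts[4], pts[0], pts[5]]   # an arbitrary realization, n = 5
n = len(ys)
w = [sum(y[a] for y in ys) / sqrt(n) for a in range(k)]
drift, second = pair_moments(ys, k)
# Remainder matrix predicted by the proof: (1/(2n)) * sum_i (Y_i Y_i^T - I_k)
predicted = [[sum(y[a] * y[b] - (a == b) for y in ys) / (2 * n) for b in range(k)]
             for a in range(k)]
```

With $\lambda = 1/n$, `drift` reproduces $-W$ as in condition (8), and `second` minus the identity reproduces `predicted`, matching the formula for the entries of the remainder matrix.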



4. Examples using Theorem [5]

4.1. Rank $k$ projections of spherically symmetric measures on $\mathbb{R}^n$. Consider a random
vector $Y \in \mathbb{R}^n$ whose distribution is spherically symmetric; i.e., if $U$ is a fixed orthogonal
matrix, then the distribution of $Y$ is the same as the distribution of $UY$. Assume that $Y$
is normalized such that $\mathbb{E}Y_1^2 = 1$. Note that the spherical symmetry then implies that
$\mathbb{E}YY^T = I_n$. Assume further that there is a constant $a$ (independent of $n$) so that

(16) $\mathrm{Var}(|Y|^2) \le a$.

For $k$ fixed, let $P_k$ denote the orthogonal projection of $\mathbb{R}^n$ onto the span of the first $k$ standard
basis vectors. In this section, Theorem [5] is applied to show that $P_k(Y) = (Y_1, \ldots, Y_k)$ is
approximately distributed as a standard $k$-dimensional Gaussian random vector if $k = o(n)$.
That $\mathbb{E}P_k(Y)P_k(Y)^T = I_k$ is immediate from the symmetry and normalization, as above.
This example is closely related to the following result of Diaconis and Freedman in [17].

Theorem 8 (Diaconis-Freedman). Let $Z_1, \ldots, Z_n$ be independent standard Gaussian random
variables and let $P_\sigma^k$ be the law of $(\sigma Z_1, \ldots, \sigma Z_k)$. For a probability $\mu$ on $[0, \infty)$, define $P_{\mu k}$
by

\[ P_{\mu k} = \int P_\sigma^k\, d\mu(\sigma). \]

Let $Y = (Y_1, \ldots, Y_n) \in \mathbb{R}^n$ be a spherically symmetric random vector, and let $P_k$ be the law
of $(Y_1, \ldots, Y_k)$. Then there is a probability measure $\mu$ on $[0, \infty)$ such that for $1 \le k \le n - 4$,

\[ d_{TV}(P_k, P_{\mu k}) \le \frac{2(k + 3)}{n - k - 3}. \]

Furthermore, the mixing measure $\mu$ can be taken to be the law of $\frac{1}{\sqrt{n}}|Y|$.






In some cases, the explicit form given in Theorem [8] for the mixing measure has allowed
the theorem to be used to prove central limit theorems of interest in convex geometry; see
[IT] and [26]. Theorem [10] below says that the variance bound (16) is sufficient to show
that the mixing measure of Theorem [8] can be taken to be a point mass. In fact, it is not
too difficult to obtain the total variation analog of Theorem [10] directly from the
Diaconis-Freedman result and (16); however, the Stein's method proof given below is considerably
simpler than the direct proof given in [IT]. The rates obtained are of the same order, though
the rate obtained by Diaconis and Freedman is in the total variation distance, whereas the
rate below is in the Wasserstein distance.



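To apply Theorem [5], construct a family of exchangeable pairs as follows. For $\epsilon > 0$ fixed,
let

\[ A_\epsilon := \begin{bmatrix} \sqrt{1 - \epsilon^2} & \epsilon \\ -\epsilon & \sqrt{1 - \epsilon^2} \end{bmatrix} \oplus I_{n-2} = I_n + \begin{bmatrix} -\frac{\epsilon^2}{2} + \delta & \epsilon \\ -\epsilon & -\frac{\epsilon^2}{2} + \delta \end{bmatrix} \oplus 0_{n-2}, \]

where $\delta$ is a deterministic constant and $\delta = O(\epsilon^4)$. Let $U$ be a Haar-distributed $n \times n$
random orthogonal matrix, independent of $Y$, and let $Y_\epsilon = (U A_\epsilon U^T)\, Y$. Thus $Y_\epsilon$ is a small
random rotation of $Y$. In what follows, Theorem [5] is applied to the exchangeable pair
$(P_k(Y), P_k(Y_\epsilon))$.

Let $K$ be the $n \times 2$ matrix made of the first two columns of $U$ and $C_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$; define
$Q := K C_2 K^T$. Then by the construction of $Y_\epsilon$,

(17) $\displaystyle P_k(Y_\epsilon) - P_k(Y) = \epsilon \Bigl[-\Bigl(\frac{\epsilon}{2} - \epsilon^{-1}\delta\Bigr) P_k K K^T + P_k Q\Bigr] Y,$

and $\epsilon^{-1}\delta = O(\epsilon^3)$.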
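The size of the correction term in the rotation can be checked numerically. The sketch below rests on two assumptions that are not spelled out in the surviving text: that the nontrivial $2 \times 2$ block of the rotation has diagonal entries $\sqrt{1 - \epsilon^2}$ and off-diagonal entries $\pm\epsilon$, and that the correction (called `delta` here) is the gap between $\sqrt{1 - \epsilon^2}$ and its second-order expansion $1 - \epsilon^2/2$. The function names are hypothetical.

```python
from math import sqrt

def rotation_block(eps):
    """2 x 2 rotation with diagonal sqrt(1 - eps^2) and off-diagonal +/- eps."""
    c = sqrt(1.0 - eps * eps)
    return [[c, eps], [-eps, c]]

def delta(eps):
    """Gap between sqrt(1 - eps^2) and its second-order expansion 1 - eps^2/2."""
    return sqrt(1.0 - eps * eps) - (1.0 - eps * eps / 2.0)

eps = 0.1
b = rotation_block(eps)
# Orthogonality: the rows have unit length and are mutually orthogonal.
row_norm = b[0][0] ** 2 + b[0][1] ** 2
row_dot = b[0][0] * b[1][0] + b[0][1] * b[1][1]
ratio = delta(eps) / eps ** 4    # approaches -1/8 as eps tends to 0
```

Under these assumptions the correction behaves like $-\epsilon^4/8$ for small $\epsilon$, which is consistent with the stated order $O(\epsilon^4)$.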

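To check the conditions of Theorem [5], the following lemma is needed; see [52], Lemma 3.3
and Theorem 1.6 for a detailed proof.

Lemma 9. If $U = [u_{ij}]_{i,j=1}^n$ is an orthogonal matrix distributed according to Haar measure,
then $\mathbb{E}\bigl[\prod u_{ij}^{k_{ij}}\bigr]$ is non-zero if and only if the number of entries from each row and from each
column is even. Second and fourth-degree moments are as follows:

(i) For all $i, j$,

\[ \mathbb{E}\bigl[u_{ij}^2\bigr] = \frac{1}{n}. \]

(ii) For all $i, j, r, s, \alpha, \beta, \lambda, \mu$,

\begin{align*}
\mathbb{E}\bigl[u_{ij} u_{rs} u_{\alpha\beta} u_{\lambda\mu}\bigr]
= &-\frac{1}{(n-1)n(n+2)} \Bigl[\delta_{ir}\delta_{\alpha\lambda}\delta_{j\beta}\delta_{s\mu} + \delta_{ir}\delta_{\alpha\lambda}\delta_{j\mu}\delta_{s\beta} + \delta_{i\alpha}\delta_{r\lambda}\delta_{js}\delta_{\beta\mu} \\
&\qquad\qquad + \delta_{i\alpha}\delta_{r\lambda}\delta_{j\mu}\delta_{\beta s} + \delta_{i\lambda}\delta_{r\alpha}\delta_{js}\delta_{\beta\mu} + \delta_{i\lambda}\delta_{r\alpha}\delta_{j\beta}\delta_{s\mu}\Bigr] \\
&+ \frac{n+1}{(n-1)n(n+2)} \Bigl[\delta_{ir}\delta_{\alpha\lambda}\delta_{js}\delta_{\beta\mu} + \delta_{i\alpha}\delta_{r\lambda}\delta_{j\beta}\delta_{s\mu} + \delta_{i\lambda}\delta_{r\alpha}\delta_{j\mu}\delta_{s\beta}\Bigr].
\end{align*}

(iii) For the matrix $Q = [q_{ij}]_{i,j=1}^n$ defined as above, $q_{ij} = u_{i1}u_{j2} - u_{i2}u_{j1}$. For all $i, j, \ell, p$,

\[ \mathbb{E}\bigl[q_{ij} q_{\ell p}\bigr] = \frac{2}{n(n-1)} \bigl[\delta_{i\ell}\delta_{jp} - \delta_{ip}\delta_{j\ell}\bigr]. \]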

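By the lemma, $\mathbb{E}[KK^T] = \frac{2}{n} I$ and $\mathbb{E}[Q] = 0$, and so

\[ \lim_{\epsilon \to 0} \frac{n}{\epsilon^2}\, \mathbb{E}\bigl[(P_k(Y_\epsilon) - P_k(Y)) \bigm| P_k(Y)\bigr] = -P_k(Y); \]

condition (i) of Theorem [5] thus holds with $\lambda(\epsilon) = \frac{\epsilon^2}{n}$.

Fix $i, j \le k$. By (17),

\begin{align*}
\lim_{\epsilon \to 0} \frac{n}{2\epsilon^2}\, \mathbb{E}\bigl[(P_k(Y_\epsilon) - P_k(Y))_i (P_k(Y_\epsilon) - P_k(Y))_j \bigm| Y\bigr]
&= \frac{n}{2}\, \mathbb{E}\bigl[(P_k Q Y)_i (P_k Q Y)_j \bigm| Y\bigr] \\
&= \frac{n}{2}\, \mathbb{E}\Bigl[\sum_{\ell, m} Y_\ell Y_m q_{i\ell} q_{jm} \Bigm| Y\Bigr] \\
&= \frac{1}{n-1} \bigl[\delta_{ij} |Y|^2 - Y_i Y_j\bigr].
\end{align*}

Thus

\[ F = \frac{1}{n-1} \Bigl[\bigl(\mathbb{E}\bigl[|Y|^2 - (n-1) \bigm| P_k(Y)\bigr]\bigr) \cdot I_k - P_k(Y) P_k(Y)^T\Bigr]. \]

Now,

\[ \mathbb{E}\|P_k(Y) P_k(Y)^T\|_{H.S.} = \mathbb{E}|P_k(Y)|^2 = k \]

by assumption, and

\[ \mathbb{E}\bigl|\mathbb{E}\bigl[|Y|^2 - (n-1) \bigm| P_k(Y)\bigr]\bigr| \le \sqrt{a} + 1, \]

so applying Theorem [5] gives:

Theorem 10. With notation as above,

\[ d_W(P_k(Y), Z) \le \frac{\sqrt{k}\,(\sqrt{a} + 1) + k}{n - 1}. \]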



4.2. Rank $k$ projections of Haar measure on $\mathcal{O}_n$.

A theme in studying random matrices from the compact classical matrix groups is that 
these matrices are in many ways (though not all ways) similar to Gaussian random matrices. 
For example, it was shown in [1] that if $M$ is a random matrix in the orthogonal group $\mathcal{O}_n$ distributed according to Haar measure, then

\[
\sup_{\substack{A \,:\, \mathrm{Tr}(AA^T) = n \\ -\infty < x < \infty}} \left|P\left(\mathrm{Tr}(AM) \le x\right) - \Phi(x)\right| \to 0
\]

as $n \to \infty$. In [33], this result was refined to include a rate of convergence (in total variation) of $W = \mathrm{Tr}(AM)$ to a standard Gaussian random variable, depending only on the value of $\mathrm{Tr}(AA^T)$. That is, rank one projections of Haar measure on $\mathcal{O}_n$ are uniformly close to Gaussian, and rank one projections of Gaussian random matrices are exactly Gaussian.

A natural question is whether rank $k$ projections of Haar measure on $\mathcal{O}_n$ are close, in some sense, to multivariate Gaussian distributions, and if so, how large $k$ can be. This is a more refined comparison of the type mentioned above, since the distributions of all projections of any rank of Gaussian matrices are Gaussian. In the remarkable recent work [25], Tiefeng Jiang has shown that the entries of any $p_n \times q_n$ submatrix of an $n \times n$ random orthogonal matrix are close to i.i.d. Gaussians in total variation distance whenever $p_n = o(\sqrt{n})$ and $q_n = o(\sqrt{n})$, and that these orders of $p_n$ and $q_n$ are best possible. This improved an earlier result of Diaconis, Eaton, and Lauritzen [16], which proved the result in the case of $p_n = o(n^{1/3})$ and $q_n = o(n^{1/3})$. As this article was in preparation, Benoit Collins and Michael Stolz [15] proved that for $r$ fixed, $A_1, \ldots, A_r$ deterministic parameter matrices, and $M$ a uniformly distributed element of a classical compact symmetric space (represented as a space of matrices), the random vector $(\mathrm{Tr}(A_1M), \ldots, \mathrm{Tr}(A_rM))$ converges weakly to a Gaussian random vector, as the dimension of the space tends to infinity. Their work in particular covers the cases of $M$ a Haar-distributed random orthogonal or unitary matrix, but goes farther to consider more general homogeneous spaces.

In this section, it is shown that rank $k$ projections of Haar measure on $\mathcal{O}_n$ are close in Wasserstein distance to Gaussian for $k = o(n)$. This in particular recovers Jiang's result (in Wasserstein distance), but is more general in that it is uniform over all rank $k$ projections, and not just those having the special form of truncation to a sub-matrix. The theorem also strengthens the result of Collins and Stolz, in the case that $M$ is a random element of $\mathcal{O}_n$.

Theorem 11. Let $B_1, \ldots, B_k$ be linearly independent $n \times n$ matrices (i.e. the only linear combination of them which is equal to the zero matrix has all coefficients equal to zero) over $\mathbb{R}$ such that $\mathrm{Tr}(B_iB_i^T) = n$ for each $i$. Let $b_{ij} = \mathrm{Tr}(B_iB_j^T)$. Let $M$ be a random orthogonal matrix and let

\[
X = \left(\mathrm{Tr}(B_1M), \mathrm{Tr}(B_2M), \ldots, \mathrm{Tr}(B_kM)\right) \in \mathbb{R}^k.
\]

Let $Y = (Y_1, \ldots, Y_k)$ be a random vector whose components have the standard Gaussian distribution, with covariance matrix $C := \frac{1}{n}(b_{ij})_{i,j=1}^k$. Then for $n \ge 2$,

\[
d_W(X, Y) \le \frac{k\sqrt{2\|C\|_{op}}}{n-1}.
\]



Remark. Lemma 9 and an easy computation show that for all $i, j$,

\[
E\left[\mathrm{Tr}(B_iM)\mathrm{Tr}(B_jM)\right] = \frac{1}{n}\langle B_i, B_j\rangle,
\]

thus the matrix $C$ above is also the covariance matrix of $X$.
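The covariance identity in the remark can be checked by simulation. The sketch below is a Monte Carlo illustration only: it samples Haar-distributed orthogonal matrices by applying Gram-Schmidt to the rows of a Gaussian matrix (a standard construction, not taken from the text) and averages the product of the two traces. The matrices `B1` and `B2`, the sample size and the seed are hypothetical choices.

```python
import math
import random

def haar_orthogonal(n, rng):
    """Gram-Schmidt on the rows of an n x n Gaussian matrix gives a Haar orthogonal matrix."""
    rows = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        for u in rows:
            dot = sum(x * y for x, y in zip(u, v))
            v = [x - dot * y for x, y in zip(v, u)]
        norm = math.sqrt(sum(x * x for x in v))
        rows.append([x / norm for x in v])
    return rows

def trace_product(b, m):
    """Tr(B M) = sum over i, j of b_ij m_ji."""
    n = len(b)
    return sum(b[i][j] * m[j][i] for i in range(n) for j in range(n))

def empirical_covariance(b1, b2, samples, seed=0):
    """Average of Tr(B1 M) Tr(B2 M) over independent Haar samples M."""
    rng = random.Random(seed)
    n = len(b1)
    total = 0.0
    for _ in range(samples):
        m = haar_orthogonal(n, rng)
        total += trace_product(b1, m) * trace_product(b2, m)
    return total / samples

n = 4
B1 = [[2.0 if i == j == 0 else 0.0 for j in range(n)] for i in range(n)]  # Tr(B1 B1^T) = n
B2 = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]       # Tr(B2 B2^T) = n
estimate = empirical_covariance(B1, B2, samples=8000)
exact = sum(B1[i][j] * B2[i][j] for i in range(n) for j in range(n)) / n   # (1/n) <B1, B2>
print(estimate, exact)
```

Here the exact value is $\frac{1}{n}\langle B_1, B_2\rangle = 0.5$, and the estimate should agree with it up to Monte Carlo error.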

It is shown below that Theorem 11 follows fairly easily from the following special case.

Theorem 12. Let $A_1, \ldots, A_k$ be $n \times n$ matrices over $\mathbb{R}$ satisfying $\mathrm{Tr}(A_iA_j^T) = n\delta_{ij}$; for $i \ne j$, $A_i$ and $A_j$ are orthogonal with respect to the Hilbert-Schmidt inner product. Let $M$ be a random orthogonal matrix, and consider the vector $X = \left(\mathrm{Tr}(A_1M), \mathrm{Tr}(A_2M), \ldots, \mathrm{Tr}(A_kM)\right) \in \mathbb{R}^k$. Let $Z = (Z_1, \ldots, Z_k)$ be a random vector whose components are independent standard normal random variables. Then for $n \ge 2$,

\[
\left|Ef(X) - Ef(Z)\right| \le \frac{\sqrt{2}\,M_1(f)\,k}{n-1},
\]

where $M_1(f)$ is the Lipschitz constant of $f$.

Example. Let $M$ be a random $n \times n$ orthogonal matrix, and let $0 < a_1 < a_2 < \ldots < a_k = n$. For each $1 \le i \le k$, let

\[
B_i = \sqrt{\frac{n}{a_i}}\left(I_{a_i} \oplus 0_{n-a_i}\right);
\]

$B_i$ has $\sqrt{n/a_i}$ in the first $a_i$ diagonal entries and zeros everywhere else. If $i \le j$, then

\[
\langle B_i, B_j\rangle_{H.S.} = n\sqrt{\frac{a_i}{a_j}};
\]

in particular, $\langle B_i, B_i\rangle_{H.S.} = n$. The $B_i$ are linearly independent w.r.t. the Hilbert-Schmidt inner product since the $a_i$ are all distinct, so to apply Theorem 11 we have only to bound the eigenvalues of the matrix $\left(\sqrt{a_{\min(i,j)}/a_{\max(i,j)}}\right)_{i,j=1}^k$. But this is easy, since

\[
|\lambda| \le \sqrt{\sum_{i,j=1}^k \frac{a_{\min(i,j)}}{a_{\max(i,j)}}} \le k
\]

for all eigenvalues $\lambda$ (see, e.g., [H]). It now follows from Theorem 11 that if $Y$ is a vector of standard normals with covariance matrix $\left(\sqrt{a_{\min(i,j)}/a_{\max(i,j)}}\right)_{i,j=1}^k$ and $X = \left(\mathrm{Tr}(B_1M), \ldots, \mathrm{Tr}(B_kM)\right)$, then

\[
\sup_{M_1(f) \le 1} \left|Ef(X) - Ef(Y)\right| \le \frac{\sqrt{2}\,k^{3/2}}{n-1}.
\]
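The eigenvalue bound used in the example is easy to check for a concrete choice of the $a_i$. The sketch below builds the covariance matrix with entries $\sqrt{a_{\min(i,j)}/a_{\max(i,j)}}$ and compares its Hilbert-Schmidt norm, which dominates every eigenvalue in absolute value, with $k$. The sequence `a` is a hypothetical choice and the helper names are illustrative.

```python
import math

def example_covariance(a):
    """Matrix with entries sqrt(a_min / a_max) for an increasing sequence a_1 < ... < a_k."""
    return [[math.sqrt(min(x, y) / max(x, y)) for y in a] for x in a]

def hilbert_schmidt_norm(c):
    """Square root of the sum of the squared entries of c."""
    return math.sqrt(sum(v * v for row in c for v in row))

a = [1, 4, 9, 16]   # hypothetical choice with n = a_k = 16
C = example_covariance(a)
k = len(a)
hs = hilbert_schmidt_norm(C)
print(hs, k)        # every entry is at most 1, so the norm is at most k
```

Since each of the $k^2$ entries lies in $(0, 1]$, the Hilbert-Schmidt norm can never exceed $k$, whatever increasing sequence is used.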

Proofs 

Proof of Theorem 11 from Theorem 12. Perform the Gram-Schmidt algorithm on the matrices $\{B_1, \ldots, B_k\}$ with respect to the Hilbert-Schmidt inner product $\langle C, D\rangle = \mathrm{Tr}(CD^T)$ to get matrices $\{A_1, \ldots, A_k\}$ which are mutually orthogonal and have H-S norm $\sqrt{n}$. Denote the matrix which takes the $B$'s to the $A$'s by $D^{-1}$, for $D = [d_{ij}]$; the matrix is invertible since the $B$'s are linearly independent. Now by assumption,

\[
b_{ij} = \langle B_i, B_j\rangle = \Big\langle \sum_{\ell} d_{i\ell}A_\ell, \sum_{p} d_{jp}A_p \Big\rangle = n\sum_{\ell} d_{i\ell}d_{j\ell}.
\]

Thus $DD^T = C = \frac{1}{n}(b_{ij})_{i,j=1}^k$.

Now, let $f : \mathbb{R}^k \to \mathbb{R}$ with $M_1(f) \le 1$. Define $h : \mathbb{R}^k \to \mathbb{R}$ by $h(x) = f(Dx)$. Then $M_1(h) \le \|D\|_{op} \le \sqrt{\|DD^T\|_{op}}$. By Theorem 12,

\[
\left|Eh\left(\mathrm{Tr}(A_1M), \ldots, \mathrm{Tr}(A_kM)\right) - Eh(Z)\right| \le \frac{k\sqrt{2\|C\|_{op}}}{n-1}
\]

for $Z$ a standard Gaussian random vector in $\mathbb{R}^k$. But $D\left(\mathrm{Tr}(A_1M), \ldots, \mathrm{Tr}(A_kM)\right) = \left(\mathrm{Tr}(B_1M), \ldots, \mathrm{Tr}(B_kM)\right)$ and $DZ$ has standard normal components with covariance matrix $C = \frac{1}{n}(b_{ij})_{i,j=1}^k$. □
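The factorization $DD^T = C$ used in this proof can be made concrete. When Gram-Schmidt is applied to $B_1, \ldots, B_k$ in order, each $B_i$ is a combination of $A_1, \ldots, A_i$ only, so $D$ is lower triangular with positive diagonal, and such a factor of $C$ is its Cholesky factor. The sketch below computes that factor in plain Python for a hypothetical $3 \times 3$ covariance matrix; it is an illustration under these assumptions rather than part of the proof.

```python
import math

def cholesky(c):
    """Lower-triangular d with d d^T = c, for a symmetric positive definite c."""
    k = len(c)
    d = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(d[i][l] * d[j][l] for l in range(j))
            if i == j:
                d[i][j] = math.sqrt(c[i][i] - s)
            else:
                d[i][j] = (c[i][j] - s) / d[j][j]
    return d

def times_transpose(d):
    """The product d d^T."""
    k = len(d)
    return [[sum(d[i][l] * d[j][l] for l in range(k)) for j in range(k)]
            for i in range(k)]

# Hypothetical covariance matrix: entries sqrt(a_min / a_max) for a = (1, 4, 16).
C = [[1.0, 0.5, 0.25],
     [0.5, 1.0, 0.5],
     [0.25, 0.5, 1.0]]
D = cholesky(C)
print(D)
print(times_transpose(D))   # reproduces C up to rounding
```

With $D$ in hand, a Lipschitz test function $f$ for $X$ corresponds to $h(x) = f(Dx)$ for the orthogonalized vector, which is the substitution made in the proof.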



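Proof of Theorem 12. Make an exchangeable pair $(M, M_\epsilon)$ as before; let $A_\epsilon$ be the rotation

\[
A_\epsilon = \begin{bmatrix} \sqrt{1-\epsilon^2} & \epsilon \\ -\epsilon & \sqrt{1-\epsilon^2} \end{bmatrix} \oplus I_{n-2}
= I_n + \begin{bmatrix} \sqrt{1-\epsilon^2} - 1 & \epsilon \\ -\epsilon & \sqrt{1-\epsilon^2} - 1 \end{bmatrix} \oplus 0_{n-2},
\]

let $U$ be a Haar-distributed random orthogonal matrix, independent of $M$, and let

\[
M_\epsilon = U A_\epsilon U^T M.
\]

Let $X_\epsilon = \left(\mathrm{Tr}(A_1M_\epsilon), \ldots, \mathrm{Tr}(A_kM_\epsilon)\right)$.

As in Section 4.1, define $K$ to be the first two columns of $U$ and $C_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, and let $Q = KC_2K^T$. Then

\[
M_\epsilon - M = \epsilon\left[\left(-\frac{\epsilon}{2} + O(\epsilon^3)\right)KK^T + Q\right]M. \tag{18}
\]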



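It follows from Lemma 9 that $E[KK^T] = \frac{2}{n}I$ and $E[Q] = 0$, thus

\[
\begin{aligned}
\lim_{\epsilon \to 0} \frac{n}{\epsilon^2} E\left[(X_\epsilon - X)_i \,\big|\, M\right]
&= \lim_{\epsilon \to 0} \frac{n}{\epsilon^2} E\left[\mathrm{Tr}\left[A_i(M_\epsilon - M)\right] \,\big|\, M\right] \\
&= \lim_{\epsilon \to 0} \frac{n}{\epsilon^2}\left[\left(-\frac{\epsilon^2}{2} + O(\epsilon^4)\right) E\left[\mathrm{Tr}(A_iKK^TM) \,\big|\, M\right] + \epsilon E\left[\mathrm{Tr}(A_iQM) \,\big|\, M\right]\right] \\
&= \lim_{\epsilon \to 0} \frac{n}{\epsilon^2}\left(-\frac{\epsilon^2}{2} + O(\epsilon^4)\right)\frac{2}{n}X_i \\
&= -X_i.
\end{aligned}
\]

Condition (i) of Theorem 5 is thus satisfied with $\lambda(\epsilon) = \frac{\epsilon^2}{n}$. The random matrix $F$ is computed as follows. For notational convenience, write $A_i = A = (a_{pq})$ and $A_j = B = (b_{\alpha\beta})$. By (18),

\[
\begin{aligned}
\lim_{\epsilon \to 0} \frac{n}{2\epsilon^2} E\left[(X_\epsilon - X)_i (X_\epsilon - X)_j \,\big|\, M\right]
&= \frac{n}{2} E\left[\mathrm{Tr}(AQM)\mathrm{Tr}(BQM) \,\big|\, M\right] \\
&= \frac{n}{2} E\Big[\sum_{p,q,r,\alpha,\beta,\gamma} a_{pq}b_{\alpha\beta}m_{rp}m_{\gamma\alpha}q_{qr}q_{\beta\gamma} \,\Big|\, M\Big] \\
&= \frac{1}{n-1}\left[\langle A, B\rangle_{H.S.} - \mathrm{Tr}(MAMB)\right] \\
&= \frac{1}{n-1}\left[n\delta_{ij} - \mathrm{Tr}(MAMB)\right].
\end{aligned} \tag{19}
\]

Thus

\[
F = \frac{1}{n-1}\left[\delta_{ij} - \mathrm{Tr}(A_iMA_jM)\right]_{i,j=1}^k.
\]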

Claim: If $n \ge 2$, then $E\left[\mathrm{Tr}(A_iMA_jM) - \delta_{ij}\right]^2 \le 2$ for all $i$ and $j$.

The claim gives that, for $n \ge 2$,

\[
E\|F\|_{H.S.} \le \sqrt{E\|F\|_{H.S.}^2} \le \frac{\sqrt{2}\,k}{n-1},
\]

thus completing the proof.

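To prove the claim, first observe that Lemma 9 implies

\[
E\left[\mathrm{Tr}(A_iMA_jM)\right] = \frac{1}{n}\langle A_i, A_j\rangle = \delta_{ij}.
\]

Again writing $A_i = A$ and $A_j = B$, applying Lemma 9(ii) gives

\[
\begin{aligned}
E\left[\mathrm{Tr}(AMBM)\right]^2
&= E\Big[\sum_{\substack{p,q,r,s \\ \alpha,\beta,\lambda,\mu}} a_{sp}a_{\mu\alpha}b_{qr}b_{\beta\lambda}m_{pq}m_{rs}m_{\alpha\beta}m_{\lambda\mu}\Big] \\
&= -\frac{2}{(n-1)n(n+2)}\left[\mathrm{Tr}(A^TAB^TB) + \mathrm{Tr}(AB^TAB^T) + \mathrm{Tr}(AA^TBB^T)\right] \\
&\qquad + \frac{n+1}{(n-1)n(n+2)}\left[2\langle A, B\rangle_{H.S.}^2 + \|A\|_{H.S.}^2\|B\|_{H.S.}^2\right].
\end{aligned}
\]

Now, as the Hilbert-Schmidt norm is submultiplicative (see [9], page 94),

\[
\mathrm{Tr}(A^TAB^TB) \le \|A^TA\|_{H.S.}\|B^TB\|_{H.S.} \le \|A\|_{H.S.}^2\|B\|_{H.S.}^2 = n^2,
\]

and the other two summands of the first line are bounded by $n^2$ in the same way. Also,

\[
2\langle A, B\rangle_{H.S.}^2 + \|A\|_{H.S.}^2\|B\|_{H.S.}^2 = n^2(1 + 2\delta_{ij}).
\]

Thus

\[
E\left[\mathrm{Tr}(A_iMA_jM) - \delta_{ij}\right]^2 \le \frac{-6n^2 + (n+1)n^2(1 + 2\delta_{ij}) - (n-1)n(n+2)\delta_{ij}}{(n-1)n(n+2)} \le 2.
\]

□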



4.3. Complex-linear functions of random unitary matrices. 



In this section, we consider Haar-distributed random matrices in $\mathcal{U}_n$. As discussed in the previous section, a general theme in studying random matrices from the classical compact matrix groups has been to compare to the corresponding Gaussian distribution. In particular, it was shown in [1] that if $M = \Gamma + i\Lambda$ is a random $n \times n$ unitary matrix and $A$ and $B$ are fixed real diagonal matrices with $\mathrm{Tr}(AA^T) = \mathrm{Tr}(BB^T) = n$, then $\mathrm{Tr}(A\Gamma) + i\,\mathrm{Tr}(B\Lambda)$ converges in distribution to standard complex normal. This implies in particular that $\mathrm{Re}(\mathrm{Tr}(AM))$ converges in distribution to $\mathcal{N}\left(0, \frac{1}{2}\right)$. A total variation rate of convergence for this last statement was obtained in [33], giving as an easy consequence the weak-star convergence of the random variable $W = \mathrm{Tr}(AM)$ to standard complex normal, for $A$ an $n \times n$ matrix over $\mathbb{C}$ with $\mathrm{Tr}(AA^*) = n$. The approaches used in [1] and [33] are somewhat awkward, partly due to the fact that the limiting behavior of $W$ is a multivariate question. In this section, Corollary 6 is applied to prove the analogous result to Theorem 12 for complex-rank $k$ projections of Haar measure on the space of random unitary matrices. As in the previous section, this result recovers and strengthens the result of Collins and Stolz [15], in the case that $M$ is a Haar-distributed unitary matrix.






Theorem 13. Let $M \in \mathcal{U}_n$ be distributed according to Haar measure, and let $\{A_i\}_{i=1}^k$ be fixed $n \times n$ matrices over $\mathbb{C}$ such that $\mathrm{Tr}(A_iA_j^*) = n\delta_{ij}$. Let $W(M) = \left(\mathrm{Tr}(A_1M), \ldots, \mathrm{Tr}(A_kM)\right)$ and let $Z$ be a standard complex Gaussian random vector in $\mathbb{C}^k$. Then there is a universal constant $c$ such that

\[
d_W(W, Z) \le \frac{ck}{n}.
\]

Remark: The constant $c$ given by the proof is asymptotically equal to $\sqrt{2}$; for $n \ge 4$, $c$ can be taken to be 3.

For the proof, the following lemma is needed. See [32], Lemma 3.5 for a detailed proof. 

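Lemma 14. Let $H = [h_{ij}]_{i,j=1}^n \in \mathcal{U}_n$ be distributed according to Haar measure. Then the expected value of a product of entries of $H$ and their conjugates is non-zero only when there are the same number of entries as conjugates of entries from each row and from each column. Second- and fourth-degree moments are as follows.

(i) For all $i, j$,
\[
E\left[|h_{ij}|^2\right] = \frac{1}{n}.
\]

(ii) For all $i, j, r, s, \alpha, \beta, \lambda, \mu$,
\[
\begin{aligned}
E\left[h_{ij}h_{rs}\overline{h_{\alpha\beta}}\,\overline{h_{\lambda\mu}}\right]
= {}& \frac{1}{(n-1)(n+1)}\left[\delta_{i\alpha}\delta_{r\lambda}\delta_{j\beta}\delta_{s\mu} + \delta_{i\lambda}\delta_{r\alpha}\delta_{j\mu}\delta_{s\beta}\right] \\
& - \frac{1}{(n-1)n(n+1)}\left[\delta_{i\alpha}\delta_{r\lambda}\delta_{j\mu}\delta_{s\beta} + \delta_{i\lambda}\delta_{r\alpha}\delta_{j\beta}\delta_{s\mu}\right].
\end{aligned}
\]

(iii) For all $i, j, r, s$,
\[
E\left[\left(h_{i1}\overline{h_{j2}} - h_{i2}\overline{h_{j1}}\right)\left(h_{r1}\overline{h_{s2}} - h_{r2}\overline{h_{s1}}\right)\right]
= -\frac{2}{(n-1)(n+1)}\delta_{is}\delta_{jr} + \frac{2}{(n-1)n(n+1)}\delta_{ij}\delta_{rs}.
\]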
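As in the orthogonal case, part (iii) can be obtained from part (ii) by expanding the product into four mixed fourth moments. The Python sketch below carries this out exactly. It assumes the standard form of the unitary fourth moments, with weight $1/((n-1)(n+1))$ when rows and columns of entries and conjugates are matched by the same permutation and $-1/((n-1)n(n+1))$ otherwise; the function names are illustrative.

```python
from fractions import Fraction

def unitary_fourth_moment(rows, cols, crows, ccols, n):
    """E[h_{r0 c0} h_{r1 c1} conj(h_{cr0 cc0}) conj(h_{cr1 cc1})] for Haar measure on U_n."""
    same = Fraction(1, (n - 1) * (n + 1))
    diff = Fraction(-1, (n - 1) * n * (n + 1))
    total = Fraction(0)
    for sigma in ((0, 1), (1, 0)):      # matches rows of entries with rows of conjugates
        for tau in ((0, 1), (1, 0)):    # matches columns in the same way
            rows_ok = all(rows[a] == crows[sigma[a]] for a in range(2))
            cols_ok = all(cols[a] == ccols[tau[a]] for a in range(2))
            if rows_ok and cols_ok:
                total += same if sigma == tau else diff
    return total

def q_product_moment(i, j, r, s, n):
    """E[q_ij q_rs] with q_ij = h_i1 conj(h_j2) - h_i2 conj(h_j1), expanded term by term."""
    m = unitary_fourth_moment
    return (m((i, r), (1, 1), (j, s), (2, 2), n) - m((i, r), (1, 2), (j, s), (2, 1), n)
            - m((i, r), (2, 1), (j, s), (1, 2), n) + m((i, r), (2, 2), (j, s), (1, 1), n))

def lemma_iii(i, j, r, s, n):
    """Right-hand side of part (iii), written with Kronecker deltas."""
    d = lambda x, y: 1 if x == y else 0
    return (Fraction(-2, (n - 1) * (n + 1)) * d(i, s) * d(j, r)
            + Fraction(2, (n - 1) * n * (n + 1)) * d(i, j) * d(r, s))

n = 4
for idx in [(1, 2, 2, 1), (1, 1, 2, 2), (1, 1, 1, 1), (1, 2, 1, 2)]:
    print(idx, q_product_moment(*idx, n), lemma_iii(*idx, n))
```

The two columns printed for each index choice agree, which is the content of part (iii) for those indices.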



Proof of Theorem 13. The theorem is proved as an application of Corollary 6, similarly to the proof of Theorem 12 via Theorem 5. Construct a family of pairs $(W, W_\epsilon)$ analogously to what was done in the orthogonal case: let $U \in \mathcal{U}_n$ be a random unitary matrix, independent of $M$, and let $M_\epsilon = UA_\epsilon U^*M$, where as before

\[
A_\epsilon = \begin{bmatrix} \sqrt{1-\epsilon^2} & \epsilon \\ -\epsilon & \sqrt{1-\epsilon^2} \end{bmatrix} \oplus I_{n-2};
\]

thus $M_\epsilon$ is a small random rotation of $M$. Let $W_\epsilon = W(M_\epsilon)$; $(W, W_\epsilon)$ is exchangeable by construction.

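As in the previous sections, let $I_2$ be the $2 \times 2$ identity matrix, $K$ the $n \times 2$ matrix made from the first two columns of $U = [u_{ij}]$, and let

\[
C_2 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.
\]

Define the matrix $Q = KC_2K^*$. Then

\[
M_\epsilon = M + K\left[\left(\sqrt{1-\epsilon^2} - 1\right)I_2 + \epsilon C_2\right]K^*M,
\]

and

\[
\mathrm{Tr}(A_iM_\epsilon) - \mathrm{Tr}(A_iM) = \left(-\frac{\epsilon^2}{2} + O(\epsilon^4)\right)\mathrm{Tr}(A_iKK^*M) + \epsilon\,\mathrm{Tr}(A_iQM). \tag{20}
\]

It follows from Lemma 14 that $E[KK^*] = \frac{2}{n}I$ and $E[Q] = 0$, thus

\[
\lim_{\epsilon \to 0} \frac{n}{\epsilon^2} E\left[\mathrm{Tr}(A_iM_\epsilon) - \mathrm{Tr}(A_iM) \,\big|\, M\right] = -\mathrm{Tr}(A_iM), \tag{21}
\]

and the first condition of Corollary 6 holds with $\lambda(\epsilon) = \frac{\epsilon^2}{n}$.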

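Let $A_i =: A = [a_{pq}]$ and $A_j =: B = [b_{\alpha\beta}]$; by (20) and Lemma 14,

\[
\begin{aligned}
\lim_{\epsilon \to 0} \frac{n}{2\epsilon^2} E\left[(W_\epsilon - W)_i (W_\epsilon - W)_j \,\big|\, W\right]
&= \frac{n}{2} E\left[\left(\mathrm{Tr}(AQM)\right)\left(\mathrm{Tr}(BQM)\right) \,\big|\, W\right] \\
&= \frac{n}{2} E\Big[\sum_{p,q,r,\alpha,\beta,\gamma} a_{pq}m_{rp}b_{\alpha\beta}m_{\gamma\alpha}\left(u_{q1}\overline{u_{r2}} - u_{q2}\overline{u_{r1}}\right)\left(u_{\beta1}\overline{u_{\gamma2}} - u_{\beta2}\overline{u_{\gamma1}}\right) \,\Big|\, W\Big] \\
&= \frac{1}{(n-1)(n+1)} E\Big[\sum_{p,q,\alpha,\beta} a_{pq}m_{qp}b_{\alpha\beta}m_{\beta\alpha} - n\sum_{p,q,\alpha,\beta} a_{pq}b_{\alpha\beta}m_{\beta p}m_{q\alpha} \,\Big|\, W\Big] \\
&= \frac{1}{(n-1)(n+1)} E\left[\mathrm{Tr}(AM)\mathrm{Tr}(BM) - n\,\mathrm{Tr}(AMBM) \,\big|\, W\right].
\end{aligned} \tag{22}
\]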



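Similarly, one can use part (iii) of Lemma 14 with the roles of $r$ and $s$ reversed to get

\[
\begin{aligned}
\lim_{\epsilon \to 0} \frac{n}{2\epsilon^2} E\left[(W_\epsilon - W)_i \overline{(W_\epsilon - W)_j} \,\big|\, W\right]
&= \frac{n}{2} E\left[\mathrm{Tr}(AQM)\overline{\mathrm{Tr}(BQM)} \,\big|\, W\right] \\
&= \frac{n}{2} E\Big[\sum_{p,q,r,\alpha,\beta,\gamma} a_{pq}m_{rp}\overline{b_{\alpha\beta}}\,\overline{m_{\gamma\alpha}}\left(u_{q1}\overline{u_{r2}} - u_{q2}\overline{u_{r1}}\right)\left(\overline{u_{\beta1}}u_{\gamma2} - \overline{u_{\beta2}}u_{\gamma1}\right) \,\Big|\, W\Big] \\
&= \frac{1}{(n-1)(n+1)}\left[n\,\mathrm{Tr}(AB^*) - \mathrm{Tr}(AM)\overline{\mathrm{Tr}(BM)}\right] \\
&= \frac{1}{(n-1)(n+1)}\left[n^2\delta_{ij} - \mathrm{Tr}(AM)\overline{\mathrm{Tr}(BM)}\right],
\end{aligned} \tag{23}
\]

where the fact that $M$ is unitary and the assumption $\mathrm{Tr}(A_iA_j^*) = n\delta_{ij}$ have been used to get the last two lines.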
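One can thus take

\[
\Gamma = \frac{1}{(n-1)(n+1)}\left[\delta_{ij} - \mathrm{Tr}(A_iM)\overline{\mathrm{Tr}(A_jM)}\right]_{i,j=1}^k,
\qquad
\Lambda = \frac{1}{(n-1)(n+1)}\left[\mathrm{Tr}(A_iM)\mathrm{Tr}(A_jM) - n\,\mathrm{Tr}(A_iMA_jM)\right]_{i,j=1}^k.
\]

By the Cauchy-Schwarz inequality,

\[
E\|\Gamma\|_{H.S.} \le \sqrt{\sum_{i,j} E|\Gamma_{ij}|^2}
\]

and

\[
E|\Gamma_{ij}|^2 = \frac{1}{(n-1)^2(n+1)^2} E\left[\delta_{ij} - 2\delta_{ij}\,\mathrm{Re}\left(\mathrm{Tr}(A_iM)\overline{\mathrm{Tr}(A_jM)}\right) + \left|\mathrm{Tr}(A_iM)\mathrm{Tr}(A_jM)\right|^2\right].
\]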



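Now,

\[
E\left|\mathrm{Tr}(A_iM)\overline{\mathrm{Tr}(A_jM)}\right| \le \sqrt{E|\mathrm{Tr}(A_iM)|^2\,E|\mathrm{Tr}(A_jM)|^2} = 1
\]

by the normalization of the matrices $A_i$. Again writing $A = A_i$ and $B = A_j$,

\[
\begin{aligned}
& E\left|\mathrm{Tr}(AM)\mathrm{Tr}(BM)\right|^2 \\
&\quad = E\Big[\sum_{\substack{p,q,r,s \\ \alpha,\beta,\lambda,\mu}} a_{pq}\overline{a_{\alpha\beta}}b_{rs}\overline{b_{\lambda\mu}}m_{qp}m_{sr}\overline{m_{\beta\alpha}}\,\overline{m_{\mu\lambda}}\Big] \\
&\quad = \frac{1}{(n-1)(n+1)}\left[\mathrm{Tr}(AA^*)\mathrm{Tr}(BB^*) + \left|\mathrm{Tr}(AB^*)\right|^2 - \frac{1}{n}\mathrm{Tr}(AA^*BB^*) - \frac{1}{n}\mathrm{Tr}(A^*AB^*B)\right] \\
&\quad = \frac{1}{(n-1)(n+1)}\left[n^2(1 + \delta_{ij}) - \frac{1}{n}\mathrm{Tr}(AA^*BB^*) - \frac{1}{n}\mathrm{Tr}(A^*AB^*B)\right],
\end{aligned}
\]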

where Lemma [14] has been used to get the third line and the normalization and orthogonality 
conditions on the Ai have been used to get the last line. Now, 

\[
\left|\mathrm{Tr}(AA^*BB^*)\right| \le \|AA^*\|_{H.S.}\|BB^*\|_{H.S.} \le \|A\|_{H.S.}\|A^*\|_{H.S.}\|B\|_{H.S.}\|B^*\|_{H.S.} = n^2;
\]

the first inequality is just the Cauchy-Schwarz inequality for the Hilbert-Schmidt inner product and the second is due to the submultiplicativity of the Hilbert-Schmidt norm (see [9], page 94). It now follows that



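\[
E|\Gamma_{ij}|^2 \le \frac{1}{(n-1)^2(n+1)^2}\left[\delta_{ij} + 2 + \frac{n^2(1 + \delta_{ij}) + 2n}{(n-1)(n+1)}\right] \le \frac{1}{(n-1)^2(n+1)^2}\left[5 + \frac{2}{n-1}\right],
\]

and thus

\[
E\|\Gamma\|_{H.S.} \le \frac{k}{(n-1)(n+1)}\sqrt{5 + \frac{2}{n-1}}. \tag{24}
\]

Taking a similar approach to bounding $E\|\Lambda\|_{H.S.}$,

\[
E|\Lambda_{ij}|^2 = \frac{1}{(n-1)^2(n+1)^2} E\Big[\left|\mathrm{Tr}(A_iM)\mathrm{Tr}(A_jM)\right|^2 - 2n\,\mathrm{Re}\left(\mathrm{Tr}(A_iM)\mathrm{Tr}(A_jM)\overline{\mathrm{Tr}(A_iMA_jM)}\right) + n^2\left|\mathrm{Tr}(A_iMA_jM)\right|^2\Big]. \tag{25}
\]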



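It has already been shown that

\[
E\left|\mathrm{Tr}(A_iM)\mathrm{Tr}(A_jM)\right|^2 \le \frac{n^2(1 + \delta_{ij}) + 2n}{(n-1)(n+1)} \le 2 + \frac{2}{n-1}.
\]

One can use Lemma 14 to compute the other two terms similarly:

\[
\begin{aligned}
& E\left[\mathrm{Tr}(AM)\mathrm{Tr}(BM)\overline{\mathrm{Tr}(AMBM)}\right] \\
&\quad = \sum_{\substack{p,q,r,s \\ \alpha,\beta,\lambda,\mu}} a_{pq}\overline{a_{\alpha\beta}}b_{rs}\overline{b_{\lambda\mu}}\,E\left[m_{qp}m_{sr}\overline{m_{\beta\lambda}}\,\overline{m_{\mu\alpha}}\right] \\
&\quad = \frac{1}{(n-1)(n+1)}\left[\mathrm{Tr}(AA^*BB^*) + \mathrm{Tr}(A^*AB^*B) - \frac{1}{n}\mathrm{Tr}(AA^*)\mathrm{Tr}(BB^*) - \frac{1}{n}\left(\mathrm{Tr}(AB^*)\right)^2\right] \\
&\quad = \frac{1}{(n-1)(n+1)}\left[\mathrm{Tr}(AA^*BB^*) + \mathrm{Tr}(A^*AB^*B) - n(1 + \delta_{ij})\right],
\end{aligned}
\]

thus

\[
\left|E\left[2n\,\mathrm{Re}\left(\mathrm{Tr}(A_iM)\mathrm{Tr}(A_jM)\overline{\mathrm{Tr}(A_iMA_jM)}\right)\right]\right| \le \frac{4n^3 + 2n^2(1 + \delta_{ij})}{(n-1)(n+1)} \le 4n + \frac{4n}{n-1},
\]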



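and

\[
\begin{aligned}
E\left|\mathrm{Tr}(AMBM)\right|^2
&= \sum_{\substack{p,q,r,s \\ \alpha,\beta,\lambda,\mu}} a_{pq}\overline{a_{\alpha\beta}}b_{rs}\overline{b_{\lambda\mu}}\,E\left[m_{qr}m_{sp}\overline{m_{\beta\lambda}}\,\overline{m_{\mu\alpha}}\right] \\
&= \frac{1}{(n-1)(n+1)}\left[\mathrm{Tr}(AA^*)\mathrm{Tr}(BB^*) + \left(\mathrm{Tr}(AB^*)\right)^2 - \frac{1}{n}\mathrm{Tr}(AA^*BB^*) - \frac{1}{n}\mathrm{Tr}(A^*AB^*B)\right] \\
&= \frac{1}{(n-1)(n+1)}\left[n^2(1 + \delta_{ij}) - \frac{1}{n}\mathrm{Tr}(AA^*BB^*) - \frac{1}{n}\mathrm{Tr}(A^*AB^*B)\right],
\end{aligned}
\]

thus

\[
n^2 E\left|\mathrm{Tr}(A_iMA_jM)\right|^2 \le \frac{n^4(1 + \delta_{ij}) + 2n^3}{(n-1)(n+1)} \le 2n^2 + \frac{2n^2}{n-1}.
\]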



Using these three bounds in (25) yields



2(n 2 + 5) 



(n-l)V (n-l)(n + l) 2 ' 



□